An Effective Cost-Sensitive XGBoost Method for Malicious URLs Detection in Imbalanced Dataset
Abstract
Class imbalance is a common problem in the modeling process and has attracted growing attention from scholars. If the imbalance ratio between the number of positive labels and the number of negative labels is ignored, the resulting classifiers are biased, which limits their performance on minority classes. The synthetic minority over-sampling technique (SMOTE) is a classic and popular method that is widely used to address this problem. However, SMOTE increases label noise and training time during the process. To improve the detection rate for minority classes while ensuring efficiency, we propose a cost-sensitive XGBoost (CS-XGB) method for imbalanced data. The CS-XGB method can reduce the preference for the majority class without changing the distribution of the original data. 600,000 Uniform Resource Locators (URLs) were collected to validate the method. We compare XGBoost (XGB), SMOTE+XGB, and CS-XGB, and the experimental results confirm that the CS-XGB method is robust and efficient in most cases.
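The abstract does not give the CS-XGB loss itself, so the following is only a minimal sketch of the general idea behind cost-sensitive boosting: a class-weighted logistic loss whose gradient and Hessian could be supplied to a gradient-boosting library as a custom objective. The function names, the `pos_cost` and `neg_cost` parameters, and the choice of the negative-to-positive ratio as the positive-class cost are all assumptions made for illustration, not the paper's formulation. It also assumes the positive class is the minority.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def imbalance_ratio(labels):
    """Ratio of negative to positive labels (assumes at least one positive)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return n_neg / n_pos


def cost_sensitive_logloss_grad_hess(margins, labels, pos_cost=1.0, neg_cost=1.0):
    """Gradient and Hessian of a class-weighted logistic loss.

    Loss per sample: -w * [y*log(p) + (1-y)*log(1-p)], with p = sigmoid(margin)
    and w = pos_cost for y == 1, neg_cost for y == 0 (hypothetical cost scheme).
    Derivatives are taken with respect to the raw margin.
    """
    grads, hesss = [], []
    for z, y in zip(margins, labels):
        p = sigmoid(z)
        w = pos_cost if y == 1 else neg_cost
        grads.append(w * (p - y))          # first derivative
        hesss.append(w * p * (1.0 - p))    # second derivative
    return grads, hesss


# Toy example: one positive (minority) sample among three negatives.
labels = [1, 0, 0, 0]
margins = [0.0, 0.0, 0.0, 0.0]
grads, hesss = cost_sensitive_logloss_grad_hess(
    margins, labels, pos_cost=imbalance_ratio(labels)
)
```

Because the cost enters only through the loss, the training data are left untouched, which matches the abstract's point that the original distribution is not changed. In the XGBoost library, the built-in `scale_pos_weight` parameter applies a comparable weighting to positive samples.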
Similar resources
Cost-sensitive decision tree ensembles for effective imbalanced classification
Real-life datasets are often imbalanced, that is, there are significantly more training samples available for some classes than for others, and consequently the conventional aim of reducing overall classification accuracy is not appropriate when dealing with such problems. Various approaches have been introduced in the literature to deal with imbalanced datasets, and are typically based on over...
A cost effective and sensitive method for the determination of ammonia concentration in nanocrystal mordenite
The reduction capacity of ammonia, even when present at ppm level, can be demonstrated by the increased encounter probability between ammonia and methylene blue dye (MB+) incorporated in nanomordenite and Na-MOR zeolites. The rate of reduction of methylene blue dye by ammonia on the surface of nanomordenite zeolite is faster than on Na-mordenite (Na-MOR) zeolite. Because nanomordenite zeolite with high si...
A Cost-Sensitive Ensemble Method for Class-Imbalanced Datasets
... costs for the positive and negative classes, SVM can be extended to the cost-sensitive setting by introducing an additional parameter that penalizes the errors asymmetrically. Consider that we have a binary classification problem, which is represented by a data set $\{(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)\}$, where $x_i \in \mathbb{R}^k$ represents a k-dimensional data point and ...
Cost-Sensitive Detection of Malicious Applications in Mobile Devices
Mobile phones have become a primary communication device nowadays. In order to maintain proper functionality, various existing security solutions are being integrated into mobile devices. Some of the more sophisticated solutions, such as host-based intrusion detection systems (HIDS) are based on continuously monitoring many parameters in the device such as CPU and memory consumption. Since the ...
Journal
Journal title: IEEE Access
Year: 2021
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2021.3093094